license: “CC BY-NC”
Creative Commons: Attribution, Non-Commercial
https://creativecommons.org/licenses/by-nc/4.0/
Find this repository: https://github.com/libjohn/workshop_twitter_analysis
Much of this review comes from Introduction to gathering tweets with rtweet, using the rtweet package. Conveniently, you no longer need a Twitter API developer key to use this package, but you do need a Twitter account.
search_tweets() - search for one or more keywords or hashtags
I recommend limiting the number of tweets returned (n = 1000) for this training. Otherwise you may hit a rate limit.
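search_tweets() passes its query string through Twitter's standard search operators, so you can combine terms directly in the query. A minimal sketch (the query strings below are illustrative assumptions, not from the workshop):

```r
library(rtweet)

# Match tweets containing either hashtag
either <- search_tweets("#BTS OR #BTSArmy", n = 100, include_rts = FALSE)

# Exact phrase, excluding retweets with a query operator
# instead of the include_rts argument
phrase <- search_tweets('"boy with luv" -filter:retweets', n = 100)
```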
#bts <- search_tweets("#BTS", n = 5000, include_rts = FALSE)
bts_dynamite <- search_tweets("#BTS dynamite", n = 1000, include_rts = FALSE)
Requesting token on behalf of user...
Waiting for authentication in browser...
Press Esc/Ctrl + C to abort
Authentication complete.
Downloading [=========================================] 100%
If you were unable to authenticate with the Twitter API, you may be using a cloud-hosted RStudio instance. In that case you will need to use the rtweet::create_token() function. Read more about obtaining and using tokens; specifically, see authorization method 2, the access token/secret method.
Gathering tweets returns a 90-variable tibble with as many rows as you were able to collect. You’ll want to spend some time familiarizing yourself with these variables and the range of data that can be gathered.
# bts
bts_dynamite
glimpse(bts_dynamite)
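With 90 columns, the glimpse() output is long. Selecting a handful of commonly used variables can make exploration easier; a sketch using column names from the rtweet search results:

```r
library(dplyr)

bts_dynamite %>%
  select(screen_name, created_at, text, retweet_count, favorite_count) %>%
  arrange(desc(retweet_count))
```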
my_gathered_tweets <- search_tweets("_________", n = 1000, include_rts = FALSE)
If you’re collecting data into the future, you may want to set your Twitter API search to run on a schedule. How you set up your compute infrastructure matters here. One way is to use the Duke VCM cloud computers along with the
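One hedged possibility for scheduling on a Linux machine is the cronR package (an assumption here, not part of the workshop): save your search_tweets() call in a script and register it with cron.

```r
library(cronR)

# gather_tweets.R is a hypothetical script containing the
# search_tweets() call and code to save the results
cmd <- cron_rscript("gather_tweets.R")
cron_add(command = cmd, frequency = "daily", at = "03:00",
         id = "tweet_gather", description = "Daily tweet collection")
```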
Find all the accounts a user follows
john_little <- get_friends("john_little")
This returns the Twitter screen name, i.e. user, and the user_id for each account that user follows.
john_little
Next, use lookup_users() with those user_id values to get more information about each account.
john_little_data <- lookup_users(john_little$user_id)
john_little_data
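The lookup_users() result includes profile fields such as location. A quick sketch of summarizing where those accounts say they are (assuming the location column from rtweet users data):

```r
library(dplyr)

john_little_data %>%
  count(location, sort = TRUE)
```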
Who is following me? get_followers()
jrl_flw <- get_followers("john_little")
jrl_flw_data <- lookup_users(jrl_flw$user_id)
jrl_flw_data
Get the most recent tweets from an account
rg_tmls <- get_timelines("RhiannonGiddens", n = 3200)
rg_tmls %>%
summarise(min(created_at), max(created_at))
rg_tmls %>%
dplyr::filter(created_at >= "2016-01-01") %>%
dplyr::group_by(screen_name) %>%
ts_plot("weeks", trim = 1L) +
ggplot2::geom_point() +
geom_smooth(se = FALSE, color = "cadetblue") +
colorblindr::scale_color_OkabeIto() +
hrbrthemes::theme_ipsum(grid = "Y") +
ggplot2::theme(
legend.title = ggplot2::element_blank(),
legend.position = "bottom",
plot.title = ggplot2::element_text(face = "bold")
) +
ggplot2::labs(
x = NULL, y = NULL,
title = "Frequency of Twitter statuses",
subtitle = "Twitter status (tweet) counts aggregated by week from Jan. 2016",
caption = "Source: Data collected from Twitter's REST API via rtweet"
)
# ggsave("images/giddens_timeline.png")
Get the most recent favorites from a user
rg_faves <- get_favorites("RhiannonGiddens", n = 3000)
rg_faves
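Like any rtweet tibble, the favorites can be summarized with dplyr. For example, counting which accounts appear most often among the favorited tweets (a sketch assuming the screen_name column present in rtweet results):

```r
library(dplyr)

rg_faves %>%
  count(screen_name, sort = TRUE) %>%
  slice_head(n = 10)
```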
Search users’ profiles
gullah <- search_users("#gullah", n = 1000)
Searching for users...
Finished collecting users!
gullah
What is trending in a specific location?
# sf <- get_trends("san francisco")
# durham <- get_trends(lat = 36.0, lng = -78.9)
greensboro <- get_trends("greensboro")
greensboro
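The trends tibble includes a tweet_volume column; sorting on it surfaces the busiest trends. A sketch (note that tweet_volume is NA for low-volume trends):

```r
library(dplyr)

greensboro %>%
  select(trend, tweet_volume) %>%
  arrange(desc(tweet_volume))
```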
Using the tidygeocoder package, we can find location coordinates when place names are available.
# glimpse(rg_tmls)
rg_places <- rg_tmls %>%
drop_na(place_name) %>%
select(place_name:bbox_coords) %>%
distinct() %>%
mutate(addr = glue::glue("{place_full_name}, {country}")) %>%
tidygeocoder::geocode(addr, method = "osm")
rg_places
You can create maps in R. Below is one of the easiest methods, especially if you already know ggplot2.
rg_places %>%
distinct() %>%
drop_na(lat) %>%
ggplot(aes(long, lat), color="grey99") +
borders("world") +
geom_point(color = "goldenrod") +
ggrepel::geom_label_repel(aes(label = place_full_name),
segment.color = "goldenrod", segment.size = 1,
color = "navy") +
theme_void()
# ggsave("images/giddens_locations_map.png")
Very similar to the map above. For accounts with “#gullah” in their profile that also list location information, geocode the locations.
gullah_places <- gullah %>%
drop_na(place_name) %>%
select(place_name:bbox_coords) %>%
filter(country_code == "US") %>%
distinct() %>%
mutate(addr = glue::glue("{place_full_name}, {country}")) %>%
tidygeocoder::geocode(addr, method = "osm")
gullah_places
And now visualize on a US map of the lower 48 states.
You can learn more about basic R mapping from our workshop on mapping with R
gullah_places %>%
distinct() %>%
drop_na(lat) %>%
ggplot(aes(long, lat), color="grey99") +
borders("state") +
geom_point(color = "goldenrod") +
ggrepel::geom_label_repel(aes(label = place_full_name),
segment.color = "goldenrod", segment.size = 1,
color = "navy") +
theme_void()